An Algorithm of Drawing Website Information Based on Webpage File Code

Computer and Modernization, 2012, Vol. 198, Issue (2): 38-39. doi:10.3969/j.issn.1006-2475.2012.02.011

ZHAO Xiao-feng, LING Tian-bin, PENG Bo, WANG Zhuan-ni

Education Technology Center, Foreign Languages College of Chinese People's Liberation Army, Luoyang, Henan 471003, China

Received: 2011-08-29; Online: 2012-02-24; Published: 2012-02-24

Abstract

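By analyzing the code of web page source files, this paper designs an information extraction algorithm intended to replace manual collection of website information and so avoid repetitive work. It first compares and analyzes two existing Web structures, then proposes an information extraction scheme for each of them. The schemes are then verified and implemented in code using the well-known Japanese news website NHK as an example. Finally, the paper discusses prospects for further extending the system's functionality.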

Keywords: Web structure, information extraction, web page tags
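
The abstract describes extracting information by analyzing the tags in a page's source file but does not give the algorithm itself. As an illustration only, the following sketch uses Python's standard `html.parser` module and assumes that the target items are links inside a container element identified by a tag name and a class attribute. The class names `news-list` and `nav`, the helper names, and the sample markup are hypothetical; they are not taken from the NHK site or from the authors' implementation.

```python
from html.parser import HTMLParser


class ContainerLinkExtractor(HTMLParser):
    """Collect (href, text) pairs for <a> elements nested inside a container
    element with a given tag name and CSS class (hypothetical markers)."""

    def __init__(self, container_tag: str, container_class: str) -> None:
        super().__init__()  # convert_charrefs=True by default
        self.container_tag = container_tag
        self.container_class = container_class
        self.depth = 0        # > 0 while inside the matched container
        self.href = None      # href of the <a> currently open, if any
        self.text_parts = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == self.container_tag:
            if self.depth > 0:
                # Same-tag element nested inside the container: count it so
                # its end tag does not close the container prematurely.
                self.depth += 1
            elif self.container_class in (attr_map.get("class") or "").split():
                self.depth = 1
        elif tag == "a" and self.depth > 0 and attr_map.get("href") is not None:
            self.href = attr_map["href"]
            self.text_parts = []

    def handle_endtag(self, tag):
        if tag == self.container_tag and self.depth > 0:
            self.depth -= 1
        elif tag == "a" and self.href is not None:
            # Join text from the link and any inline children such as <b>.
            self.items.append((self.href, "".join(self.text_parts).strip()))
            self.href = None

    def handle_data(self, data):
        if self.href is not None:
            self.text_parts.append(data)


def extract_links(html: str, container_tag: str, container_class: str):
    """Return the (href, text) pairs found inside the matching container."""
    parser = ContainerLinkExtractor(container_tag, container_class)
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page fragment; not real NHK markup.
SAMPLE = """
<html><body>
  <ul class="nav"><li><a href="/top">Top</a></li></ul>
  <ul class="list news-list">
    <li><a href="/news/1.html">Headline one</a></li>
    <li><a href="/news/2.html">Headline <b>two</b></a></li>
  </ul>
</body></html>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE, "ul", "news-list"):
        print(href, text)
```

Run on the sample, the sketch reports the two headline links and skips the `nav` list, whose class does not match. The depth counter is a simple way to handle nested elements with the same tag; pages with badly malformed markup may need a more tolerant parser.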

CLC number: TP301.6

赵晓峰;凌天斌;彭波;王转妮. 一种基于网页源文件的信息提取算法[J]. 计算机与现代化, 2012, 198(2): 38-39.

ZHAO Xiao-feng;LING Tian-bin;PENG Bo;WANG Zhuan-ni. An Algorithm of Drawing Website Information Based on Webpage File Code[J]. Computer and Modernization, 2012, 198(2): 38-39.
